Climate Change & Crop Production

How is climate change affecting global crop production? I will try to answer this question by using different analysis using the Food and Agriculture Organization of the United Nation (FAO) which can be found here http://www.fao.org/faostat/en/?#data/QC

import Libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline
# Suppress warnings from pandas
import warnings
warnings.filterwarnings('ignore')
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics
from sklearn.metrics import precision_score ,recall_score, f1_score
from sklearn.metrics import accuracy_score
In [2]:
import chart_studio
username = 'omegamarkos' # your username
api_key = 'PUk2JYOCBrghPN2lfAay' # your api key - go to profile > settings > regenerate key
chart_studio.tools.set_credentials_file(username=username, api_key=api_key)
In [3]:
from plotly.offline import iplot, init_notebook_mode,download_plotlyjs
import plotly.graph_objects as go
import plotly.express as px
In [4]:
#use this code to be able to display all the output in the cell instead of only displaying the out put for the last one. see the next cell.
from IPython.core.interactiveshell import InteractiveShell  
InteractiveShell.ast_node_interactivity = "all"

Crop Production Visualization

In [5]:
df_prod=pd.read_csv('FAOSTAT_crop_production.csv') # Total yearly crop production in tonnes by country
In [6]:
df_prod.head()
Out[6]:
Domain Code Domain Area Code Area Element Code Element Item Code Item Year Code Year Unit Value Flag Flag Description
0 QC Crops 2 Afghanistan 5510 Production 44 Barley 1961 1961 tonnes 378000.0 NaN Official data
1 QC Crops 2 Afghanistan 5510 Production 44 Barley 1962 1962 tonnes 378000.0 NaN Official data
2 QC Crops 2 Afghanistan 5510 Production 44 Barley 1963 1963 tonnes 378000.0 NaN Official data
3 QC Crops 2 Afghanistan 5510 Production 44 Barley 1964 1964 tonnes 380000.0 NaN Official data
4 QC Crops 2 Afghanistan 5510 Production 44 Barley 1965 1965 tonnes 380000.0 NaN Official data
In [7]:
df_prod['Item'] = df_prod['Item'].str.replace('Rice, paddy', 'Rice') # rename the the item name rice to make it easier to access
In [8]:
df_prod.rename(columns={'Value':'prod'}, inplace=True) # rename the column value to prod to avoid confusion for later 
In [9]:
df_prod.Area.unique()
Out[9]:
array(['Afghanistan', 'Albania', 'Algeria', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
       'Austria', 'Azerbaijan', 'Bahamas', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belgium-Luxembourg', 'Belize', 'Benin',
       'Bhutan', 'Bolivia (Plurinational State of)',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi',
       'Cabo Verde', 'Cambodia', 'Cameroon', 'Canada',
       'Central African Republic', 'Chad', 'Chile', 'China',
       'China, Hong Kong SAR', 'China, mainland',
       'China, Taiwan Province of', 'Colombia', 'Comoros', 'Congo',
       'Costa Rica', "Côte d'Ivoire", 'Croatia', 'Cuba', 'Cyprus',
       'Czechia', 'Czechoslovakia',
       "Democratic People's Republic of Korea",
       'Democratic Republic of the Congo', 'Denmark', 'Djibouti',
       'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia',
       'Ethiopia PDR', 'Fiji', 'Finland', 'France', 'French Guiana',
       'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Greece',
       'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guinea',
       'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hungary',
       'Iceland', 'India', 'Indonesia', 'Iran (Islamic Republic of)',
       'Iraq', 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan',
       'Kazakhstan', 'Kenya', 'Kuwait', 'Kyrgyzstan',
       "Lao People's Democratic Republic", 'Latvia', 'Lebanon', 'Lesotho',
       'Liberia', 'Libya', 'Lithuania', 'Luxembourg', 'Madagascar',
       'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Mauritania',
       'Mauritius', 'Mexico', 'Micronesia (Federated States of)',
       'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique',
       'Myanmar', 'Namibia', 'Nepal', 'Netherlands', 'New Caledonia',
       'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'North Macedonia',
       'Norway', 'Oman', 'Pacific Islands Trust Territory', 'Pakistan',
       'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru',
       'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar',
       'Republic of Korea', 'Republic of Moldova', 'Réunion', 'Romania',
       'Russian Federation', 'Rwanda', 'Saint Lucia',
       'Saint Vincent and the Grenadines', 'Sao Tome and Principe',
       'Saudi Arabia', 'Senegal', 'Serbia', 'Serbia and Montenegro',
       'Sierra Leone', 'Slovakia', 'Slovenia', 'Solomon Islands',
       'Somalia', 'South Africa', 'South Sudan', 'Spain', 'Sri Lanka',
       'Sudan', 'Sudan (former)', 'Suriname', 'Sweden', 'Switzerland',
       'Syrian Arab Republic', 'Tajikistan', 'Thailand', 'Timor-Leste',
       'Togo', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan',
       'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom',
       'United Republic of Tanzania', 'United States of America',
       'Uruguay', 'USSR', 'Uzbekistan', 'Vanuatu',
       'Venezuela (Bolivarian Republic of)', 'Viet Nam', 'Western Sahara',
       'Yemen', 'Yugoslav SFR', 'Zambia', 'Zimbabwe'], dtype=object)

Who produces the most ?

In [10]:
df_prod_2018 = df_prod.query("Year == 2018")
df_prod_2018_wheat =df_prod_2018.query("Item == 'Wheat'")
df_prod_2018_wheat_top =df_prod_2018_wheat.nlargest(10,['prod'])
df_prod_2018_wheat_top.head() 
Out[10]:
Domain Code Domain Area Code Area Element Code Element Item Code Item Year Code Year Unit prod Flag Flag Description
7429 QC Crops 351 China 5510 Production 15 Wheat 2018 2018 tonnes 131447224.0 A Aggregate, may include official, semi-official...
7835 QC Crops 41 China, mainland 5510 Production 15 Wheat 2018 2018 tonnes 131440500.0 NaN Official data
16558 QC Crops 100 India 5510 Production 15 Wheat 2018 2018 tonnes 99700000.0 NaN Official data
29521 QC Crops 185 Russian Federation 5510 Production 15 Wheat 2018 2018 tonnes 72136149.0 NaN Official data
36591 QC Crops 231 United States of America 5510 Production 15 Wheat 2018 2018 tonnes 51286540.0 NaN Official data
In [11]:
fig = px.bar(df_prod_2018_wheat_top, y='prod', x='Area', text='prod').update_traces(texttemplate='%{text:.2s}', textposition='outside').update_layout(uniformtext_minsize=8, uniformtext_mode='hide',title ='Top Wheat Producers')
fig.show()
In [12]:
#To open the plot on chart studio web
import chart_studio.plotly as py
py.plot(fig, filename = 'wheat producers', auto_open=True)
Out[12]:
'https://plotly.com/~omegamarkos/1/'
The top wheat producing countries are China, India & Russia
In [13]:
df_prod_2018_rice =df_prod_2018.query("Item == 'Rice'") # only rice data
df_prod_2018_rice_top =df_prod_2018_rice.nlargest(10,['prod']) #the top 10
In [14]:
fig = px.bar(df_prod_2018_rice_top, y='prod', x='Area', text='prod').update_traces(texttemplate='%{text:.2s}', textposition='outside').update_layout(uniformtext_minsize=8, uniformtext_mode='hide', title ='Top Rice Producers')
fig.show()
The top rice producing countries are China, India & Indonesia
In [15]:
df_prod_2018_maize =df_prod_2018.query("Item == 'Maize'")
df_prod_2018_maize_top =df_prod_2018_maize.nlargest(10,['prod'])
In [16]:
fig = px.bar(df_prod_2018_maize_top, y='prod', x='Area', text='prod').update_traces(texttemplate='%{text:.2s}', textposition='outside').update_layout(uniformtext_minsize=8, uniformtext_mode='hide',title ='Top Maize Producers')
fig.show()
In [17]:
# fig = px.bar(df_prod_rice, x="Area", y= "prod").update_xaxes(categoryorder="max descending")
# fig.show()
In [18]:
fig =  px.choropleth(df_prod,locations="Area", locationmode="country names",animation_frame="Year", animation_group="Area",
            color="prod",color_continuous_scale= 'YlGn' , hover_name="Area", title = 'Total crop production by Country').update_geos(
    resolution=50,
    showcoastlines=True, coastlinecolor="RebeccaPurple",
    showland=True, landcolor="LightGreen",
    showocean=True, oceancolor="LightBlue",
    showlakes=True, lakecolor="Blue",
    showrivers=True, rivercolor="Blue"
)
fig.show()

Crop yield per hectare Visualization

In [19]:
df_crop =pd.read_csv('FAOSTAT_data_main_crop_yield.csv') # crop yield per hectare
In [20]:
df_crop.head()
Out[20]:
Domain Code Domain Area Code Area Element Code Element Item Code Item Year Code Year Unit Value Flag Flag Description
0 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1961 1961 hg/ha 10800.0 Fc Calculated data
1 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1962 1962 hg/ha 10800.0 Fc Calculated data
2 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1963 1963 hg/ha 10800.0 Fc Calculated data
3 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1964 1964 hg/ha 10857.0 Fc Calculated data
4 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1965 1965 hg/ha 10857.0 Fc Calculated data
In [21]:
from pandas_profiling import ProfileReport
crop_profile = ProfileReport(df_crop)
crop_profile
Out[21]:

Overview

Dataset info

Number of variables 14
Number of observations 37322
Total Missing (%) 0.0%
Total size in memory 4.0 MiB
Average record size in memory 112.0 B

Variables types

Numeric 4
Categorical 4
Boolean 0
Date 0
Text (Unique) 0
Rejected 6
Unsupported 0

Warnings

  • Area has a high cardinality: 197 distinct values Warning
  • Domain has constant value Crops Rejected
  • Domain Code has constant value QC Rejected
  • Element has constant value Yield Rejected
  • Element Code has constant value 5419 Rejected
  • Unit has constant value hg/ha Rejected
  • Year is highly correlated with Year Code (ρ = 1) Rejected

Variables

Area
Categorical

Distinct count 197
Unique (%) 0.5%
Missing (%) 0.0%
Missing (n) 0
United Republic of Tanzania
 
348
Democratic Republic of the Congo
 
348
Hungary
 
348
Other values (194)
36278
Value Count Frequency (%)  
United Republic of Tanzania 348 0.9%
 
Democratic Republic of the Congo 348 0.9%
 
Hungary 348 0.9%
 
Morocco 348 0.9%
 
China 348 0.9%
 
Pakistan 348 0.9%
 
South Africa 348 0.9%
 
Australia 348 0.9%
 
United States of America 348 0.9%
 
Democratic People's Republic of Korea 348 0.9%
 
Other values (187) 33842 90.7%
 

Area Code
Numeric

Distinct count 197
Unique (%) 0.5%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 128.63
Minimum 1
Maximum 351
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 10
Q1 60
Median 129
Q3 194
95-th percentile 238
Maximum 351
Range 350
Interquartile range 134

Descriptive statistics

Standard deviation 76.352
Coef of variation 0.59358
Kurtosis -0.7286
Mean 128.63
MAD 64.613
Skewness 0.14709
Sum 4800706
Variance 5829.6
Memory size 291.7 KiB
Value Count Frequency (%)  
351 348 0.9%
 
103 348 0.9%
 
117 348 0.9%
 
215 348 0.9%
 
116 348 0.9%
 
114 348 0.9%
 
16 348 0.9%
 
143 348 0.9%
 
203 348 0.9%
 
202 348 0.9%
 
Other values (187) 33842 90.7%
 

Minimum 5 values

Value Count Frequency (%)  
1 81 0.2%
 
2 290 0.8%
 
3 239 0.6%
 
4 289 0.8%
 
7 251 0.7%
 

Maximum 5 values

Value Count Frequency (%)  
273 39 0.1%
 
276 35 0.1%
 
277 21 0.1%
 
299 75 0.2%
 
351 348 0.9%
 

Domain
Constant

This variable is constant and should be ignored for analysis

Constant value Crops

Domain Code
Constant

This variable is constant and should be ignored for analysis

Constant value QC

Element
Constant

This variable is constant and should be ignored for analysis

Constant value Yield

Element Code
Constant

This variable is constant and should be ignored for analysis

Constant value 5419

Flag
Categorical

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 6
Fc
37316
(Missing)
 
6
Value Count Frequency (%)  
Fc 37316 100.0%
 
(Missing) 6 0.0%
 

Flag Description
Categorical

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 6
Calculated data
37316
(Missing)
 
6
Value Count Frequency (%)  
Calculated data 37316 100.0%
 
(Missing) 6 0.0%
 

Item
Categorical

Distinct count 6
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Maize
8969
Rice, paddy
6714
Wheat
6400
Other values (3)
15239
Value Count Frequency (%)  
Maize 8969 24.0%
 
Rice, paddy 6714 18.0%
 
Wheat 6400 17.1%
 
Sorghum 5741 15.4%
 
Barley 5170 13.9%
 
Millet 4328 11.6%
 

Item Code
Numeric

Distinct count 6
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 48.91
Minimum 15
Maximum 83
Zeros (%) 0.0%

Quantile statistics

Minimum 15
5-th percentile 15
Q1 27
Median 56
Q3 79
95-th percentile 83
Maximum 83
Range 68
Interquartile range 52

Descriptive statistics

Standard deviation 24.14
Coef of variation 0.49355
Kurtosis -1.3164
Mean 48.91
MAD 20.874
Skewness 0.052831
Sum 1825437
Variance 582.73
Memory size 291.7 KiB
Value Count Frequency (%)  
56 8969 24.0%
 
27 6714 18.0%
 
15 6400 17.1%
 
83 5741 15.4%
 
44 5170 13.9%
 
79 4328 11.6%
 

Minimum 5 values

Value Count Frequency (%)  
15 6400 17.1%
 
27 6714 18.0%
 
44 5170 13.9%
 
56 8969 24.0%
 
79 4328 11.6%
 

Maximum 5 values

Value Count Frequency (%)  
27 6714 18.0%
 
44 5170 13.9%
 
56 8969 24.0%
 
79 4328 11.6%
 
83 5741 15.4%
 

Unit
Constant

This variable is constant and should be ignored for analysis

Constant value hg/ha

Value
Numeric

Distinct count 23596
Unique (%) 63.2%
Missing (%) 0.0%
Missing (n) 6
Infinite (%) 0.0%
Infinite (n) 0
Mean 24207
Minimum 0
Maximum 367620
Zeros (%) 0.0%

Quantile statistics

Minimum 0
5-th percentile 4981.5
Q1 10000
Median 17143
Q3 31453
95-th percentile 64484
Maximum 367620
Range 367620
Interquartile range 21453

Descriptive statistics

Standard deviation 22917
Coef of variation 0.9467
Kurtosis 29.002
Mean 24207
MAD 15381
Skewness 3.7772
Sum 903310000
Variance 525170000
Memory size 291.7 KiB
Value Count Frequency (%)  
10000.0 442 1.2%
 
20000.0 250 0.7%
 
25000.0 144 0.4%
 
15000.0 123 0.3%
 
30000.0 119 0.3%
 
12500.0 95 0.3%
 
5000.0 90 0.2%
 
6667.0 90 0.2%
 
8000.0 90 0.2%
 
13333.0 83 0.2%
 
Other values (23585) 35790 95.9%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 8 0.0%
 
200.0 1 0.0%
 
343.0 1 0.0%
 
353.0 1 0.0%
 
364.0 1 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
340977.0 1 0.0%
 
348767.0 1 0.0%
 
352913.0 1 0.0%
 
357202.0 1 0.0%
 
367619.0 1 0.0%
 

Year
Highly correlated

This variable is highly correlated with Year Code and should be ignored for analysis

Correlation 1

Year Code
Numeric

Distinct count 58
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 1990.8
Minimum 1961
Maximum 2018
Zeros (%) 0.0%

Quantile statistics

Minimum 1961
5-th percentile 1964
Q1 1976
Median 1992
Q3 2005
95-th percentile 2016
Maximum 2018
Range 57
Interquartile range 29

Descriptive statistics

Standard deviation 16.759
Coef of variation 0.0084181
Kurtosis -1.1944
Mean 1990.8
MAD 14.522
Skewness -0.111
Sum 74301704
Variance 280.87
Memory size 291.7 KiB
Value Count Frequency (%)  
2016 718 1.9%
 
2015 717 1.9%
 
2017 717 1.9%
 
2018 717 1.9%
 
2013 716 1.9%
 
2014 715 1.9%
 
2012 715 1.9%
 
2011 710 1.9%
 
2010 709 1.9%
 
2008 707 1.9%
 
Other values (48) 30181 80.9%
 

Minimum 5 values

Value Count Frequency (%)  
1961 582 1.6%
 
1962 582 1.6%
 
1963 582 1.6%
 
1964 583 1.6%
 
1965 583 1.6%
 

Maximum 5 values

Value Count Frequency (%)  
2014 715 1.9%
 
2015 717 1.9%
 
2016 718 1.9%
 
2017 717 1.9%
 
2018 717 1.9%
 

Correlations

Sample

Domain Code Domain Area Code Area Element Code Element Item Code Item Year Code Year Unit Value Flag Flag Description
0 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1961 1961 hg/ha 10800.0 Fc Calculated data
1 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1962 1962 hg/ha 10800.0 Fc Calculated data
2 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1963 1963 hg/ha 10800.0 Fc Calculated data
3 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1964 1964 hg/ha 10857.0 Fc Calculated data
4 QC Crops 2 Afghanistan 5419 Yield 44 Barley 1965 1965 hg/ha 10857.0 Fc Calculated data
In [22]:
# drop the missing values
df_crop= df_crop.dropna()
In [23]:
df_crop.columns
Out[23]:
Index(['Domain Code', 'Domain', 'Area Code', 'Area', 'Element Code', 'Element',
       'Item Code', 'Item', 'Year Code', 'Year', 'Unit', 'Value', 'Flag',
       'Flag Description'],
      dtype='object')
In [24]:
#change name to avoid confusion
df_crop.rename(columns={'Value':'yield'}, inplace=True)
In [25]:
df_crop.columns
Out[25]:
Index(['Domain Code', 'Domain', 'Area Code', 'Area', 'Element Code', 'Element',
       'Item Code', 'Item', 'Year Code', 'Year', 'Unit', 'yield', 'Flag',
       'Flag Description'],
      dtype='object')

We only need the following features from the data. the rest are either constant or irrelevant
[ 'Area Code', 'Area','Item Code', 'Item', 'Year', 'Value']

In [26]:
df_crop = df_crop[['Area Code', 'Area','Item Code', 'Item', 'Year', 'yield']]
df_crop.tail()
Out[26]:
Area Code Area Item Code Item Year yield
37317 181 Zimbabwe 15 Wheat 2014 22094.0
37318 181 Zimbabwe 15 Wheat 2015 20486.0
37319 181 Zimbabwe 15 Wheat 2016 19013.0
37320 181 Zimbabwe 15 Wheat 2017 17542.0
37321 181 Zimbabwe 15 Wheat 2018 20010.0
In [27]:
# def getFiltered(df,col):
#     df_sub=df.query("Item = {}".format(col))
#     df_sub_max = df_sub.groupby(['Year'], as_index = False)['Value'].max()
#     return df_sub_max  
In [28]:
df_crop['Item'] = df_crop['Item'].str.replace('Rice, paddy', 'Rice')
In [29]:
df_crop['Item'].unique()
Out[29]:
array(['Barley', 'Maize', 'Millet', 'Rice', 'Wheat', 'Sorghum'],
      dtype=object)

Do the top producing countries also have high yield?

In [30]:
df_crop_2018 = df_crop.query("Year == 2018")
df_crop_2018_rice =df_crop_2018.query("Item == 'Rice'") # only rice data
df_crop_2018_rice_top =df_crop_2018_rice.nlargest(10,['yield']) #the top 10
df_crop_2018_rice_top.head()
Out[30]:
Area Code Area Item Code Item Year yield
1782 10 Australia 27 Rice 2018 103860.0
11033 59 Egypt 27 Rice 2018 88265.0
34949 231 United States of America 27 Rice 2018 86211.0
35239 234 Uruguay 27 Rice 2018 85000.0
26494 170 Peru 27 Rice 2018 81240.0
In [31]:
fig = px.pie(df_crop_2018_rice_top, values='yield', names='Area',
             title='top high rice yield ',
             hover_data=['yield'], labels={'yield':' crop yield'}).update_traces(textposition='inside', textinfo='percent+label')
fig.show()
In [32]:
fig = px.bar(df_crop_2018_rice_top, y='yield', x='Area', text='yield').update_traces(texttemplate='%{text:.2s}', textposition='outside').update_layout(uniformtext_minsize=8, uniformtext_mode='hide',title ='Top Rice Yield')
fig.show()
In [33]:
df_crop_2018_wheat =df_crop_2018.query("Item == 'Wheat'") # only rice data
df_crop_2018_wheat_top =df_crop_2018_wheat.nlargest(10,['yield'])
In [34]:
fig = px.bar(df_crop_2018_wheat_top, y='yield', x='Area', text='yield').update_traces(texttemplate='%{text:.2s}', textposition='outside').update_layout(uniformtext_minsize=8, uniformtext_mode='hide',title ='Top Wheat Yield')
fig.show()
In [35]:
df_crop_2018_maize =df_crop_2018.query("Item == 'Maize'") # only rice data
df_crop_2018_maize_top =df_crop_2018_maize.nlargest(10,['yield'])
In [36]:
fig = px.bar(df_crop_2018_maize_top, y='yield', x='Area', text='yield').update_traces(texttemplate='%{text:.2s}', textposition='outside').update_layout(uniformtext_minsize=8, uniformtext_mode='hide',title ='Top Maize Yield')
fig.show()

Crop & Region merged

In [37]:
df_reg =pd.read_csv('regional_code.csv') #  regions for the country data 
In [38]:
df_reg.head()
Out[38]:
Country Group Code Country Group Country Code Country M49 Code ISO2 Code ISO3 Code
0 5100 Africa 4 Algeria 12.0 DZ DZA
1 5100 Africa 7 Angola 24.0 AO AGO
2 5100 Africa 53 Benin 204.0 BJ BEN
3 5100 Africa 20 Botswana 72.0 BW BWA
4 5100 Africa 24 British Indian Ocean Territory 86.0 IO IOT
In [39]:
df_region = df_reg[['Country Group','Country Code']]
df_region.head()
Out[39]:
Country Group Country Code
0 Africa 4
1 Africa 7
2 Africa 53
3 Africa 20
4 Africa 24
In [40]:
df_region.isna().sum()
Out[40]:
Country Group    0
Country Code     0
dtype: int64
In [41]:
df_crop.head()
Out[41]:
Area Code Area Item Code Item Year yield
0 2 Afghanistan 44 Barley 1961 10800.0
1 2 Afghanistan 44 Barley 1962 10800.0
2 2 Afghanistan 44 Barley 1963 10800.0
3 2 Afghanistan 44 Barley 1964 10857.0
4 2 Afghanistan 44 Barley 1965 10857.0
In [42]:
df_crop_reg = pd.merge(df_crop, df_region, how='left', left_on=['Area Code'], right_on=['Country Code'])
df_crop_reg.head()
Out[42]:
Area Code Area Item Code Item Year yield Country Group Country Code
0 2 Afghanistan 44 Barley 1961 10800.0 Asia 2.0
1 2 Afghanistan 44 Barley 1961 10800.0 Central Asia and Southern Asia 2.0
2 2 Afghanistan 44 Barley 1961 10800.0 Land Locked Developing Countries 2.0
3 2 Afghanistan 44 Barley 1961 10800.0 Least Developed Countries 2.0
4 2 Afghanistan 44 Barley 1961 10800.0 Low income economies 2.0
In [43]:
# df_crop_reg_profile = ProfileReport(df_crop_reg)
# df_crop_reg_profile
In [44]:
df_crop_reg = df_crop_reg.dropna()
In [45]:
df_crop_reg_2017 = df_crop_reg.query("Year =='2017'")

Map with folium

In [46]:
import folium
In [47]:
url = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data'
country_shapes = f'{url}/world-countries.json'
the_map = folium.Map()
the_map.choropleth(
    geo_data=country_shapes,
    name='choropleth',
    data=df_crop_reg_2017,
    columns=['Area', 'yield'],
    key_on='feature.properties.name',
    fill_color='Reds',
    nan_fill_color='white',
    fill_opacity=0.7,
    line_opacity=0.2,
)
folium.LayerControl().add_to(the_map)
the_map
Out[47]:
<folium.map.LayerControl at 0x259bfb0c160>
Out[47]:
In [48]:
df_crop_reg_2017_m = df_crop_reg_2017.query("Item =='Maize'")
In [49]:
url = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data'
country_shapes = f'{url}/world-countries.json'
maize_map = folium.Map()
maize_map.choropleth(
    geo_data=country_shapes,
    name='choropleth',
    data=df_crop_reg_2017_m,
    columns=['Area', 'yield'],
    key_on='feature.properties.name',
    fill_color='Reds',
    nan_fill_color='white',
    fill_opacity=0.7,
    line_opacity=0.2,
)
folium.LayerControl().add_to(maize_map)
maize_map
Out[49]:
<folium.map.LayerControl at 0x259c3c80c50>
Out[49]:

Temperature anomaly visualization

In [50]:
df_temp = pd.read_csv('global_temp_change.csv')
In [51]:
df_temp.head()
Out[51]:
Domain Code Domain Area Code Area Element Code Element Months Code Months Year Code Year Unit Value Flag Flag Description
0 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1961 1961 °C 0.777 Fc Calculated data
1 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1962 1962 °C 0.062 Fc Calculated data
2 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1963 1963 °C 2.744 Fc Calculated data
3 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1964 1964 °C -5.232 Fc Calculated data
4 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1965 1965 °C 1.868 Fc Calculated data
In [52]:
#df_temp.Area.unique()
In [53]:
df_temp.columns
Out[53]:
Index(['Domain Code', 'Domain', 'Area Code', 'Area', 'Element Code', 'Element',
       'Months Code', 'Months', 'Year Code', 'Year', 'Unit', 'Value', 'Flag',
       'Flag Description'],
      dtype='object')
In [54]:
df_temp.rename(columns={'Value':'temperature'}, inplace =True)
In [55]:
df_temp.columns
Out[55]:
Index(['Domain Code', 'Domain', 'Area Code', 'Area', 'Element Code', 'Element',
       'Months Code', 'Months', 'Year Code', 'Year', 'Unit', 'temperature',
       'Flag', 'Flag Description'],
      dtype='object')

We only need the following features from the data. the rest are either constant or irrelevant
[ 'Area Code', 'Area','Months Code', 'Months', 'Year', 'Value']

In [56]:
df_temp = df_temp[[ 'Area Code', 'Area','Months Code', 'Months', 'Year', 'temperature']]
df_temp.head()
Out[56]:
Area Code Area Months Code Months Year temperature
0 2 Afghanistan 7001 January 1961 0.777
1 2 Afghanistan 7001 January 1962 0.062
2 2 Afghanistan 7001 January 1963 2.744
3 2 Afghanistan 7001 January 1964 -5.232
4 2 Afghanistan 7001 January 1965 1.868
In [57]:
df_temp.isna().sum()
Out[57]:
Area Code         0
Area              0
Months Code       0
Months            0
Year              0
temperature    7685
dtype: int64
In [58]:
df_temp=df_temp.dropna()
In [59]:
df_temp.Area.unique()
Out[59]:
array(['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra',
       'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda',
       'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria',
       'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belgium-Luxembourg', 'Belize', 'Benin',
       'Bhutan', 'Bolivia (Plurinational State of)',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei Darussalam', 'Bulgaria',
       'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia', 'Cameroon',
       'Canada', 'Cayman Islands', 'Central African Republic', 'Chad',
       'Channel Islands', 'Chile', 'China', 'China, Hong Kong SAR',
       'China, Macao SAR', 'China, mainland', 'China, Taiwan Province of',
       'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia',
       'Comoros', 'Congo', 'Cook Islands', 'Costa Rica', "Côte d'Ivoire",
       'Croatia', 'Cuba', 'Cyprus', 'Czechia', 'Czechoslovakia',
       "Democratic People's Republic of Korea",
       'Democratic Republic of the Congo', 'Denmark', 'Djibouti',
       'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
       'Eswatini', 'Ethiopia', 'Ethiopia PDR',
       'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland',
       'France', 'French Guiana', 'French Polynesia',
       'French Southern and Antarctic Territories', 'Gabon', 'Gambia',
       'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland',
       'Grenada', 'Guadeloupe', 'Guatemala', 'Guinea', 'Guinea-Bissau',
       'Guyana', 'Haiti', 'Holy See', 'Honduras', 'Hungary', 'Iceland',
       'India', 'Indonesia', 'Iran (Islamic Republic of)', 'Iraq',
       'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan',
       'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Kuwait',
       'Kyrgyzstan', "Lao People's Democratic Republic", 'Latvia',
       'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein',
       'Lithuania', 'Luxembourg', 'Madagascar', 'Malawi', 'Malaysia',
       'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique',
       'Mauritania', 'Mauritius', 'Mayotte', 'Mexico',
       'Micronesia (Federated States of)', 'Midway Island', 'Monaco',
       'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique',
       'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands',
       'Netherlands Antilles (former)', 'New Caledonia', 'New Zealand',
       'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island',
       'North Macedonia', 'Norway', 'Oman',
       'Pacific Islands Trust Territory', 'Pakistan', 'Palau',
       'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru',
       'Philippines', 'Pitcairn Islands', 'Poland', 'Portugal',
       'Puerto Rico', 'Qatar', 'Republic of Korea', 'Republic of Moldova',
       'Réunion', 'Romania', 'Russian Federation', 'Rwanda',
       'Saint Helena, Ascension and Tristan da Cunha',
       'Saint Kitts and Nevis', 'Saint Lucia',
       'Saint Pierre and Miquelon', 'Saint Vincent and the Grenadines',
       'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia',
       'Senegal', 'Serbia', 'Serbia and Montenegro', 'Seychelles',
       'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia',
       'Solomon Islands', 'Somalia', 'South Africa',
       'South Georgia and the South Sandwich Islands', 'South Sudan',
       'Spain', 'Sri Lanka', 'Sudan', 'Sudan (former)', 'Suriname',
       'Svalbard and Jan Mayen Islands', 'Sweden', 'Switzerland',
       'Syrian Arab Republic', 'Tajikistan', 'Thailand', 'Timor-Leste',
       'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia',
       'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu',
       'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom',
       'United Republic of Tanzania', 'United States of America',
       'United States Virgin Islands', 'Uruguay', 'USSR', 'Uzbekistan',
       'Vanuatu', 'Venezuela (Bolivarian Republic of)', 'Viet Nam',
       'Wake Island', 'Wallis and Futuna Islands', 'Western Sahara',
       'Yemen', 'Yugoslav SFR', 'Zambia', 'Zimbabwe'], dtype=object)
In [60]:
# create a positive anomaly column bc the animation doesn't work for negative values
df_temp['pos_temp'] = df_temp.temperature.apply(lambda x: 0 if x < 0  else x)
df_temp.head()
Out[60]:
Area Code Area Months Code Months Year temperature pos_temp
0 2 Afghanistan 7001 January 1961 0.777 0.777
1 2 Afghanistan 7001 January 1962 0.062 0.062
2 2 Afghanistan 7001 January 1963 2.744 2.744
3 2 Afghanistan 7001 January 1964 -5.232 0.000
4 2 Afghanistan 7001 January 1965 1.868 1.868
In [61]:
# taking only the yearly anomaly 
df_temp_metro = df_temp.query("Months == 'Meteorological year'") 
df_temp_metro.head()
Out[61]:
Area Code Area Months Code Months Year temperature pos_temp
944 2 Afghanistan 7020 Meteorological year 1961 -0.080 0.000
945 2 Afghanistan 7020 Meteorological year 1962 -0.120 0.000
946 2 Afghanistan 7020 Meteorological year 1963 0.882 0.882
947 2 Afghanistan 7020 Meteorological year 1964 -0.727 0.000
948 2 Afghanistan 7020 Meteorological year 1965 -0.200 0.000
In [62]:
#sortthe highest 20 anomalies 
df_temp_top_20 = df_temp_metro.nlargest(20,['temperature'])
df_temp_top_20
Out[62]:
Area Code Area Months Code Months Year temperature pos_temp
193813 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2016 5.413 5.413
193815 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2018 4.104 4.104
193803 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2006 3.886 3.886
193809 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2012 3.825 3.825
193811 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2014 3.382 3.382
193814 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2017 3.347 3.347
193812 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2015 3.307 3.307
193804 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2007 3.050 3.050
85024 85 Greenland 7020 Meteorological year 2010 3.042 3.042
34993 33 Canada 7020 Meteorological year 2010 2.917 2.917
193816 260 Svalbard and Jan Mayen Islands 7020 Meteorological year 2019 2.903 2.903
10504 1 Armenia 7020 Meteorological year 2018 2.853 2.853
133386 141 Mongolia 7020 Meteorological year 2007 2.768 2.768
194814 210 Sweden 7020 Meteorological year 2014 2.704 2.704
10496 1 Armenia 7020 Meteorological year 2010 2.703 2.703
59545 54 Denmark 7020 Meteorological year 2014 2.680 2.680
162196 173 Poland 7020 Meteorological year 2019 2.662 2.662
55996 167 Czechia 7020 Meteorological year 2018 2.659 2.659
18478 57 Belarus 7020 Meteorological year 2019 2.657 2.657
110575 118 Kuwait 7020 Meteorological year 2010 2.643 2.643
In [64]:
#plot the highest 20 anomaly
fig = px.scatter(df_temp_top_20, x="Year", y="temperature", size='temperature' ,title='Highest Temperature Anomaly since 1960')
fig.show()
The highest 20 temperature anomaly are after the year 2000. The year 2016 being the highest
In [65]:
# Animation by year
fig =  px.choropleth(df_temp_metro,locations="Area", locationmode="country names",animation_frame="Year", animation_group="Area",
            color="pos_temp",color_continuous_scale= 'reds' , hover_name="Area", title = 'Global Temperature Anomaly')

fig.show()
In [66]:
import plotly.io as pio
pio.write_html(fig, file='temp.html', auto_open=True)

Carbon dioxide(Co2) emission visualization

In [67]:
df_co2_all = pd.read_csv('WorldCo2TotalWorldBank.csv')
In [68]:
df_co2_all.head()
Out[68]:
Country Name Country Code Indicator Name Indicator Code 1970 1971 1972 1973 1974 1975 ... 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
0 Aruba ABW Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 42.306298 42.786948 43.286613 43.72459 44.130957 44.481908 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Afghanistan AFG Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 14306.616300 14391.778200 13040.848000 13535.75410 14945.968200 14574.164000 ... 1.470988e+04 1.465437e+04 1.477026e+04 1.484916e+04 1.527263e+04 1.528878e+04 1.555443e+04 1.779133e+04 1.798131e+04 1.816886e+04
2 Angola AGO Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 60648.601080 39162.171430 42205.650700 44252.66912 40179.233500 44429.300030 ... 3.585801e+04 3.686023e+04 3.585574e+04 3.637582e+04 3.669053e+04 3.739629e+04 3.830947e+04 4.002939e+04 4.087865e+04 4.165716e+04
3 Albania ALB Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 6960.868200 6950.036100 7613.266100 7348.49210 7632.664300 7873.037000 ... 7.834016e+03 7.401102e+03 8.308386e+03 8.027150e+03 9.879394e+03 8.136785e+03 8.108812e+03 8.506996e+03 8.712296e+03 8.898641e+03
4 Arab World ARB Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 552362.062000 576946.152300 625015.943800 699280.21730 679448.005800 683966.235400 ... 1.963193e+06 1.660587e+06 1.767879e+06 1.731289e+06 2.355481e+06 2.364089e+06 2.387502e+06 1.924245e+06 2.483349e+06 1.908696e+06

5 rows × 47 columns

In [69]:
# Melt data from wide format to long format 
df_co2 = pd.melt(df_co2_all, id_vars=['Country Name','Country Code' ,'Indicator Name', 'Indicator Code'], var_name='year',value_name='Value')
df_co2.head()
Out[69]:
Country Name Country Code Indicator Name Indicator Code year Value
0 Aruba ABW Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 1970 42.306298
1 Afghanistan AFG Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 1970 14306.616300
2 Angola AGO Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 1970 60648.601080
3 Albania ALB Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 1970 6960.868200
4 Arab World ARB Total greenhouse gas emissions (kt of CO2 equi... EN.ATM.GHGT.KT.CE 1970 552362.062000
In [70]:
df_co2.isna().sum()
Out[70]:
Country Name        0
Country Code        0
Indicator Name      0
Indicator Code      0
year                0
Value             339
dtype: int64
In [74]:
#Fill the missing values using forward fill method
df_co2.fillna(method='ffill', inplace = True)
In [72]:
df_co2.isna().sum()
Out[72]:
Country Name        0
Country Code        0
Indicator Name      0
Indicator Code      0
year                0
Value             339
dtype: int64
In [73]:
df_co2= df_co2.drop(['Indicator Name','Indicator Code'], axis = 1)
df_co2.head()
Out[73]:
Country Name Country Code year Value
0 Aruba ABW 1970 42.306298
1 Afghanistan AFG 1970 14306.616300
2 Angola AGO 1970 60648.601080
3 Albania ALB 1970 6960.868200
4 Arab World ARB 1970 552362.062000
In [75]:
df_co2.rename (columns={'Country Name':'CountryName','Country Code':'CountryCode', 'Value':'co2'}, inplace= True)
In [76]:
df_co2['CountryName'].unique()
Out[76]:
array(['Aruba', 'Afghanistan', 'Angola', 'Albania', 'Arab World',
       'United Arab Emirates', 'Argentina', 'Armenia', 'American Samoa',
       'Australia', 'Austria', 'Azerbaijan', 'Burundi', 'Belgium',
       'Benin', 'Burkina Faso', 'Bangladesh', 'Bulgaria', 'Bahrain',
       'Bahamas, The', 'Bosnia and Herzegovina', 'Belarus', 'Belize',
       'Bermuda', 'Bolivia', 'Brazil', 'Barbados', 'Brunei Darussalam',
       'Bhutan', 'Botswana', 'Central African Republic', 'Canada',
       'Central Europe and the Baltics', 'Switzerland', 'Chile', 'China',
       "Cote d'Ivoire", 'Cameroon', 'Congo, Dem. Rep.', 'Congo, Rep.',
       'Colombia', 'Comoros', 'Cabo Verde', 'Costa Rica',
       'Caribbean small states', 'Cuba', 'Cayman Islands', 'Cyprus',
       'Czech Republic', 'Germany', 'Djibouti', 'Dominica', 'Denmark',
       'Dominican Republic', 'Algeria',
       'East Asia & Pacific (excluding high income)',
       'Early-demographic dividend', 'East Asia & Pacific',
       'Europe & Central Asia (excluding high income)',
       'Europe & Central Asia', 'Ecuador', 'Egypt, Arab Rep.',
       'Euro area', 'Eritrea', 'Spain', 'Estonia', 'Ethiopia',
       'European Union', 'Fragile and conflict affected situations',
       'Finland', 'Fiji', 'France', 'Micronesia, Fed. Sts.', 'Gabon',
       'United Kingdom', 'Georgia', 'Ghana', 'Guinea', 'Gambia, The',
       'Guinea-Bissau', 'Equatorial Guinea', 'Greece', 'Grenada',
       'Guatemala', 'Guam', 'Guyana', 'High income',
       'Hong Kong SAR, China', 'Honduras',
       'Heavily indebted poor countries (HIPC)', 'Croatia', 'Haiti',
       'Hungary', 'IBRD only', 'IDA & IBRD total', 'IDA total',
       'IDA blend', 'Indonesia', 'IDA only', 'India', 'Ireland',
       'Iran, Islamic Rep.', 'Iraq', 'Iceland', 'Israel', 'Italy',
       'Jamaica', 'Jordan', 'Japan', 'Kazakhstan', 'Kenya',
       'Kyrgyz Republic', 'Cambodia', 'St. Kitts and Nevis',
       'Korea, Rep.', 'Kuwait',
       'Latin America & Caribbean (excluding high income)', 'Lao PDR',
       'Lebanon', 'Liberia', 'Libya', 'St. Lucia',
       'Latin America & Caribbean',
       'Least developed countries: UN classification', 'Low income',
       'Sri Lanka', 'Lower middle income', 'Low & middle income',
       'Lesotho', 'Late-demographic dividend', 'Lithuania', 'Luxembourg',
       'Latvia', 'Macao SAR, China', 'Morocco', 'Moldova', 'Madagascar',
       'Maldives', 'Middle East & North Africa', 'Mexico',
       'Marshall Islands', 'Middle income', 'North Macedonia', 'Mali',
       'Malta', 'Myanmar',
       'Middle East & North Africa (excluding high income)', 'Mongolia',
       'Northern Mariana Islands', 'Mozambique', 'Mauritania',
       'Mauritius', 'Malawi', 'Malaysia', 'North America', 'Namibia',
       'Niger', 'Nigeria', 'Nicaragua', 'Netherlands', 'Norway', 'Nepal',
       'New Zealand', 'OECD members', 'Oman', 'Other small states',
       'Pakistan', 'Panama', 'Peru', 'Philippines', 'Papua New Guinea',
       'Poland', 'Pre-demographic dividend', 'Puerto Rico',
       'Korea, Dem. People’s Rep.', 'Portugal', 'Paraguay',
       'Pacific island small states', 'Post-demographic dividend',
       'Qatar', 'Romania', 'Russian Federation', 'Rwanda', 'South Asia',
       'Saudi Arabia', 'Sudan', 'Senegal', 'Solomon Islands',
       'Sierra Leone', 'El Salvador', 'Somalia',
       'Sub-Saharan Africa (excluding high income)', 'Sub-Saharan Africa',
       'Small states', 'Sao Tome and Principe', 'Suriname',
       'Slovak Republic', 'Slovenia', 'Sweden', 'Eswatini', 'Seychelles',
       'Syrian Arab Republic', 'Turks and Caicos Islands', 'Chad',
       'East Asia & Pacific (IDA & IBRD countries)',
       'Europe & Central Asia (IDA & IBRD countries)', 'Togo', 'Thailand',
       'Tajikistan', 'Turkmenistan',
       'Latin America & the Caribbean (IDA & IBRD countries)',
       'Timor-Leste', 'Middle East & North Africa (IDA & IBRD countries)',
       'Tonga', 'South Asia (IDA & IBRD)',
       'Sub-Saharan Africa (IDA & IBRD countries)', 'Tunisia', 'Turkey',
       'Tuvalu', 'Tanzania', 'Uganda', 'Ukraine', 'Upper middle income',
       'Uruguay', 'United States', 'Uzbekistan',
       'St. Vincent and the Grenadines', 'Venezuela, RB',
       'British Virgin Islands', 'Virgin Islands (U.S.)', 'Vietnam',
       'Vanuatu', 'World', 'Samoa', 'Yemen, Rep.', 'South Africa',
       'Zambia', 'Zimbabwe'], dtype=object)
In [79]:
df_temp_metro.head()
Out[79]:
Area Code Area Months Code Months Year temperature pos_temp
944 2 Afghanistan 7020 Meteorological year 1961 -0.080 0.000
945 2 Afghanistan 7020 Meteorological year 1962 -0.120 0.000
946 2 Afghanistan 7020 Meteorological year 1963 0.882 0.882
947 2 Afghanistan 7020 Meteorological year 1964 -0.727 0.000
948 2 Afghanistan 7020 Meteorological year 1965 -0.200 0.000
In [80]:
df_temp_metro_world_max = df_temp_metro.groupby(['Year'], as_index=False)['temperature'].max()
df_temp_metro_world_max.tail()
Out[80]:
Year temperature
54 2015 3.307
55 2016 5.413
56 2017 3.347
57 2018 4.104
58 2019 2.903
In [81]:
df_co2_world = df_co2.query("CountryName == 'World'")
df_co2_world.head()
Out[81]:
CountryName CountryCode year co2
232 World WLD 1970 27660218.47
470 World WLD 1971 26263241.30
708 World WLD 1972 28148780.85
946 World WLD 1973 29107899.05
1184 World WLD 1974 28434027.14
In [82]:
df_co2_world.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 43 entries, 232 to 10228
Data columns (total 4 columns):
CountryName    43 non-null object
CountryCode    43 non-null object
year           43 non-null object
co2            43 non-null float64
dtypes: float64(1), object(3)
memory usage: 1.7+ KB
In [83]:
df_temp_metro_world_max.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 59 entries, 0 to 58
Data columns (total 2 columns):
Year           59 non-null int64
temperature    59 non-null float64
dtypes: float64(1), int64(1)
memory usage: 1.4 KB
In [84]:
df_temp_metro_world_max['Year']=df_temp_metro_world_max['Year'].astype(str)
In [85]:
df_temp_co2 = pd.merge(df_temp_metro_world_max, df_co2_world, how='left', left_on=['Year'], right_on=['year'])
df_temp_co2.head()
Out[85]:
Year temperature CountryName CountryCode year co2
0 1961 1.906 NaN NaN NaN NaN
1 1962 1.044 NaN NaN NaN NaN
2 1963 1.174 NaN NaN NaN NaN
3 1964 1.121 NaN NaN NaN NaN
4 1965 0.856 NaN NaN NaN NaN
In [86]:
df_temp_co2.dropna(inplace= True)
df_temp_co2.head()
Out[86]:
Year temperature CountryName CountryCode year co2
9 1970 0.982 World WLD 1970 27660218.47
10 1971 0.678 World WLD 1971 26263241.30
11 1972 2.386 World WLD 1972 28148780.85
12 1973 1.147 World WLD 1973 29107899.05
13 1974 1.597 World WLD 1974 28434027.14
In [87]:
import plotly.express as px

fig = px.bar(df_temp_co2, x='Year', y='co2',
             hover_data=['co2', 'temperature'], color='temperature', title= 'Global Co2 Emission & Temperature Anomaly', 
             labels={'co2':'World co2 emission'}, height=400)
fig.show()
py.plot(fig, filename = 'Co2ByTemperature', auto_open=True)
Out[87]:
'https://plotly.com/~omegamarkos/3/'
when the Co2 emission gets higher the earth gets warmer
In [88]:
fig = go.Figure()
trace = go.Scatter(x=df_co2_world["year"], y=df_co2["co2"], mode="lines+markers")
fig.add_trace(trace)

fig.update_layout(
    title={
        "text": "World Co2 Emission",
        "x":0.5,
        "xanchor": "center"
        },
    xaxis_title="Year",
    yaxis_title="co2 emission")
fig.show()

Merging the Temp & crop

In [89]:
df_crop_reg_sorted=df_crop_reg.sort_values(by=['Year'])
df_temp_sorted=df_temp.sort_values(by=['Year'])
In [90]:
df_crop_temp  = pd.merge_asof(df_temp_sorted, df_crop_reg_sorted,
              on='Year',
              by='Area')

df_crop_temp.head()
Out[90]:
Area Code_x Area Months Code Months Year temperature pos_temp Area Code_y Item Code Item yield Country Group Country Code
0 2 Afghanistan 7001 January 1961 0.777 0.777 2.0 44.0 Barley 10800.0 Southern Asia 2.0
1 25 Solomon Islands 7005 May 1961 0.233 0.233 25.0 27.0 Rice 17391.0 World 25.0
2 259 Channel Islands 7005 May 1961 0.227 0.227 NaN NaN NaN NaN NaN NaN
3 25 Solomon Islands 7006 June 1961 0.373 0.373 25.0 27.0 Rice 17391.0 World 25.0
4 259 Channel Islands 7004 April 1961 2.249 2.249 NaN NaN NaN NaN NaN NaN
In [91]:
df_crop_reg.head()
Out[91]:
Area Code Area Item Code Item Year yield Country Group Country Code
0 2 Afghanistan 44 Barley 1961 10800.0 Asia 2.0
1 2 Afghanistan 44 Barley 1961 10800.0 Central Asia and Southern Asia 2.0
2 2 Afghanistan 44 Barley 1961 10800.0 Land Locked Developing Countries 2.0
3 2 Afghanistan 44 Barley 1961 10800.0 Least Developed Countries 2.0
4 2 Afghanistan 44 Barley 1961 10800.0 Low income economies 2.0
In [92]:
df_crop_temp.isna().sum()
Out[92]:
Area Code_x          0
Area                 0
Months Code          0
Months               0
Year                 0
temperature          0
pos_temp             0
Area Code_y      48500
Item Code        48500
Item             48500
yield            48500
Country Group    48500
Country Code     48500
dtype: int64
In [93]:
df_crop_temp= df_crop_temp.dropna()
In [94]:
df_crop_temp.shape
Out[94]:
(169677, 13)
In [95]:
df_crop_temp.isna().sum()
Out[95]:
Area Code_x      0
Area             0
Months Code      0
Months           0
Year             0
temperature      0
pos_temp         0
Area Code_y      0
Item Code        0
Item             0
yield            0
Country Group    0
Country Code     0
dtype: int64
In [96]:
df_crop_temp.head()
Out[96]:
Area Code_x Area Months Code Months Year temperature pos_temp Area Code_y Item Code Item yield Country Group Country Code
0 2 Afghanistan 7001 January 1961 0.777 0.777 2.0 44.0 Barley 10800.0 Southern Asia 2.0
1 25 Solomon Islands 7005 May 1961 0.233 0.233 25.0 27.0 Rice 17391.0 World 25.0
3 25 Solomon Islands 7006 June 1961 0.373 0.373 25.0 27.0 Rice 17391.0 World 25.0
5 25 Solomon Islands 7007 July 1961 0.462 0.462 25.0 27.0 Rice 17391.0 World 25.0
7 25 Solomon Islands 7008 August 1961 0.211 0.211 25.0 27.0 Rice 17391.0 World 25.0
In [97]:
df_crop_temp.Months.unique()
Out[97]:
array(['January', 'May', 'June', 'July', 'August', 'September', 'October',
       'November', 'Sep–Oct–Nov', 'December', 'Jun–Jul–Aug',
       'Dec–Jan–Feb', 'Mar–Apr–May', 'Meteorological year', 'April',
       'March', 'February'], dtype=object)
In [98]:
df_crop_temp['temp']= df_crop_temp.temperature.apply(lambda x : 'low' if (x>=0 and x<=2) else 'high')     
df_crop_temp.head()
Out[98]:
Area Code_x Area Months Code Months Year temperature pos_temp Area Code_y Item Code Item yield Country Group Country Code temp
0 2 Afghanistan 7001 January 1961 0.777 0.777 2.0 44.0 Barley 10800.0 Southern Asia 2.0 low
1 25 Solomon Islands 7005 May 1961 0.233 0.233 25.0 27.0 Rice 17391.0 World 25.0 low
3 25 Solomon Islands 7006 June 1961 0.373 0.373 25.0 27.0 Rice 17391.0 World 25.0 low
5 25 Solomon Islands 7007 July 1961 0.462 0.462 25.0 27.0 Rice 17391.0 World 25.0 low
7 25 Solomon Islands 7008 August 1961 0.211 0.211 25.0 27.0 Rice 17391.0 World 25.0 low
In [99]:
df_crop_temp_re = df_crop_temp.groupby(['temp']).agg({'yield':'sum'}).reset_index()
df_crop_temp_re.head()
Out[99]:
temp yield
0 high 1.477619e+09
1 low 2.791647e+09

Plot by crop type

In [100]:
product =df_crop_temp['Item'].unique()
for prod in product:
     globals()[prod] = df_crop_temp.loc[df_crop_temp['Item'] == prod]
In [101]:
Maize_max =Maize.groupby(['Year'], as_index = False)['yield'].max()
Wheat_max =Wheat.groupby(['Year'], as_index = False)['yield'].max()
Rice_max =Rice.groupby(['Year'], as_index = False)['yield'].max()
In [102]:
import plotly.graph_objects as go
fig = go.Figure()
trace1 = go.Scatter(x=Wheat_max["Year"], y=Wheat_max["yield"], mode="lines+markers", name="Max Wheat")
fig.add_trace(trace1)
trace2 = go.Scatter(x=Rice_max["Year"], y=Rice_max["yield"], mode="lines+markers", name="Max Rice")
fig.add_trace(trace2)
trace3 = go.Scatter(x=Maize_max["Year"], y=Maize_max["yield"], mode="lines+markers", name="Max Maize")
fig.add_trace(trace3)
fig.update_layout(
    title={
        "text": "Wheat, Rice & Maize Maximumy yearly Production",
        "x":0.5,
        "xanchor": "center"
        },
    xaxis_title="Year",
    yaxis_title="Yield per Hectare")
fig.show()
In [103]:
df_crop_temp.rename(columns={"Country Group": "region"} , inplace = True)
df_crop_temp['region'].unique()
Out[103]:
array(['Southern Asia', 'World', 'Land Locked Developing Countries',
       'Americas', 'Sub-Saharan Africa (including Sudan)',
       'Net Food Importing Developing Countries', 'Sub-Saharan Africa',
       'Non-Annex I countries', 'Africa', 'Least Developed Countries',
       'Asia', 'Eastern Asia and South-eastern Asia',
       'Low Income Food Deficit Countries', 'Eastern Africa',
       'Latin America and the Caribbean', 'Annex I countries', 'OECD',
       'High-income economies', 'Europe', 'Lower-middle-income economies',
       'South America', 'Western Europe',
       'Small Island Developing States', 'Western Africa',
       'Northern America and Europe', 'Western Asia and Northern Africa',
       'Upper-middle-income economies', 'Caribbean',
       'Oceania excluding Australia and New Zealand', 'Northern Africa',
       'Central America', 'Northern Europe', 'Oceania',
       'Central Asia and Southern Asia', 'Western Asia',
       'Southern Europe', 'Southern Africa', 'European Union',
       'Eastern Asia', 'East Asia (exc China)', 'Middle Africa',
       'South Asia (exc India)', 'North Africa (exc Sudan)',
       'Low income economies', 'South-Eastern Asia', 'Melanesia',
       'Eastern Europe', 'Micronesia', 'Australia and New Zealand',
       'Northern America', 'Central Asia'], dtype=object)
In [104]:
df_crop_temp['region'] = df_crop_temp['region'].str.replace(' ','_')
df_crop_temp['region'].unique()
Out[104]:
array(['Southern_Asia', 'World', 'Land_Locked_Developing_Countries',
       'Americas', 'Sub-Saharan_Africa_(including_Sudan)',
       'Net_Food_Importing_Developing_Countries', 'Sub-Saharan_Africa',
       'Non-Annex_I_countries', 'Africa', 'Least_Developed_Countries',
       'Asia', 'Eastern_Asia_and_South-eastern_Asia',
       'Low_Income_Food_Deficit_Countries', 'Eastern_Africa',
       'Latin_America_and_the_Caribbean', 'Annex_I_countries', 'OECD',
       'High-income_economies', 'Europe', 'Lower-middle-income_economies',
       'South_America', 'Western_Europe',
       'Small_Island_Developing_States', 'Western_Africa',
       'Northern_America_and_Europe', 'Western_Asia_and_Northern_Africa',
       'Upper-middle-income_economies', 'Caribbean',
       'Oceania_excluding_Australia_and_New_Zealand', 'Northern_Africa',
       'Central_America', 'Northern_Europe', 'Oceania',
       'Central_Asia_and_Southern_Asia', 'Western_Asia',
       'Southern_Europe', 'Southern_Africa', 'European_Union',
       'Eastern_Asia', 'East_Asia_(exc_China)', 'Middle_Africa',
       'South_Asia_(exc_India)', 'North_Africa_(exc_Sudan)',
       'Low_income_economies', 'South-Eastern_Asia', 'Melanesia',
       'Eastern_Europe', 'Micronesia', 'Australia_and_New_Zealand',
       'Northern_America', 'Central_Asia'], dtype=object)
In [105]:
df_crop_temp['region'] = df_crop_temp['region'].str.replace('-','_')
df_crop_temp['region'].unique()
Out[105]:
array(['Southern_Asia', 'World', 'Land_Locked_Developing_Countries',
       'Americas', 'Sub_Saharan_Africa_(including_Sudan)',
       'Net_Food_Importing_Developing_Countries', 'Sub_Saharan_Africa',
       'Non_Annex_I_countries', 'Africa', 'Least_Developed_Countries',
       'Asia', 'Eastern_Asia_and_South_eastern_Asia',
       'Low_Income_Food_Deficit_Countries', 'Eastern_Africa',
       'Latin_America_and_the_Caribbean', 'Annex_I_countries', 'OECD',
       'High_income_economies', 'Europe', 'Lower_middle_income_economies',
       'South_America', 'Western_Europe',
       'Small_Island_Developing_States', 'Western_Africa',
       'Northern_America_and_Europe', 'Western_Asia_and_Northern_Africa',
       'Upper_middle_income_economies', 'Caribbean',
       'Oceania_excluding_Australia_and_New_Zealand', 'Northern_Africa',
       'Central_America', 'Northern_Europe', 'Oceania',
       'Central_Asia_and_Southern_Asia', 'Western_Asia',
       'Southern_Europe', 'Southern_Africa', 'European_Union',
       'Eastern_Asia', 'East_Asia_(exc_China)', 'Middle_Africa',
       'South_Asia_(exc_India)', 'North_Africa_(exc_Sudan)',
       'Low_income_economies', 'South_Eastern_Asia', 'Melanesia',
       'Eastern_Europe', 'Micronesia', 'Australia_and_New_Zealand',
       'Northern_America', 'Central_Asia'], dtype=object)
In [106]:
regions =df_crop_temp['region'].unique()
for reg in regions:
     globals()[reg] = df_crop_temp.loc[df_crop_temp['region'] == reg]
In [107]:
Africa.head()
Out[107]:
Area Code_x Area Months Code Months Year temperature pos_temp Area Code_y Item Code Item yield region Country Code temp
215 35 Cabo Verde 7011 November 1961 -0.542 0.000 35.0 56.0 Maize 6500.0 Africa 35.0 high
217 35 Cabo Verde 7010 October 1961 -0.184 0.000 35.0 56.0 Maize 6500.0 Africa 35.0 high
219 35 Cabo Verde 7009 September 1961 -0.416 0.000 35.0 56.0 Maize 6500.0 Africa 35.0 high
221 35 Cabo Verde 7008 August 1961 0.117 0.117 35.0 56.0 Maize 6500.0 Africa 35.0 low
223 35 Cabo Verde 7007 July 1961 0.012 0.012 35.0 56.0 Maize 6500.0 Africa 35.0 low
In [108]:
Africa_maize = Africa.query("Item=='Maize'")
Africa_maize_ave = Africa_maize.groupby(['Year'], as_index=False)['yield'].mean()
Africa_rice = Africa.query("Item=='Rice'")
Africa_rice_ave = Africa_rice.groupby(['Year'], as_index=False)['yield'].mean()
Africa_wheat = Africa.query("Item=='Wheat'")
Africa_wheat_ave = Africa_wheat.groupby(['Year'], as_index=False)['yield'].mean()
In [109]:
fig = go.Figure()
trace1 = go.Scatter(x=Africa_wheat_ave["Year"], y=Africa_wheat_max["yield"], mode="lines+markers", name="Africa Wheat")
fig.add_trace(trace1)
trace2 = go.Scatter(x=Africa_rice_ave["Year"], y=Africa_rice_max["yield"], mode="lines+markers", name="Africa Rice")
fig.add_trace(trace2)
trace3 = go.Scatter(x=Africa_maize_ave["Year"], y=Africa_maize_max["yield"], mode="lines+markers", name="Africa Maize")
fig.add_trace(trace3)
fig.update_layout(
    title={
        "text": "Average Yearly Crop Production in Africa",
        "x":0.5,
        "xanchor": "center"
        },
    xaxis_title="Year",
    yaxis_title="Yield per Hectare")
fig.show()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-109-c0a156dd8573> in <module>()
      1 fig = go.Figure()
----> 2 trace1 = go.Scatter(x=Africa_wheat_ave["Year"], y=Africa_wheat_max["yield"], mode="lines+markers", name="Africa Wheat")
      3 fig.add_trace(trace1)
      4 trace2 = go.Scatter(x=Africa_rice_ave["Year"], y=Africa_rice_max["yield"], mode="lines+markers", name="Africa Rice")
      5 fig.add_trace(trace2)

NameError: name 'Africa_wheat_max' is not defined
In [110]:
df_temp_metro_max =df_temp_metro.groupby(['Year'], as_index = False)['temperature'].max()
df_temp_metro_max.head()
Out[110]:
Year temperature
0 1961 1.906
1 1962 1.044
2 1963 1.174
3 1964 1.121
4 1965 0.856
In [111]:
df_temp_metro_max_2018 =df_temp_metro_max.query("Year == '2018'")
df_temp_metro_max_2017 =df_temp_metro_max.query("Year == '2017'")
df_temp_metro_max_2016 =df_temp_metro_max.query("Year == '2016'")
df_temp_metro_max_2015 =df_temp_metro_max.query("Year == '2015'")
In [112]:
# import plotly.graph_objects as go
# fig = go.Figure()
# trace1 = go.Scatter(x=df_temp_metro_max_2018["Months"], y=df_temp_metro_max_2018["temperature"], mode="lines+markers", name="2018")
# fig.add_trace(trace1)
# trace2 = go.Scatter(x=df_temp_metro_max_2017["Months"], y=df_temp_metro_max_2017["temperature"], mode="lines+markers", name="2017")
# fig.add_trace(trace2)
# trace3 = go.Scatter(x=df_temp_metro_max_2016["Months"], y=df_temp_metro_max_2016["temperature"], mode="lines+markers", name="2016")
# fig.add_trace(trace3)
# fig.update_layout(
#     title={
#         "text": "Max Temp anomaly ",
#         "x":0.5,
#         "xanchor": "center"
#         },
#     xaxis_title="Year",
#     yaxis_title="temp")
# fig.show()
In [113]:
Rice.head()
Out[113]:
Area Code_x Area Months Code Months Year temperature pos_temp Area Code_y Item Code Item yield Country Group Country Code temp
1 25 Solomon Islands 7005 May 1961 0.233 0.233 25.0 27.0 Rice 17391.0 World 25.0 low
3 25 Solomon Islands 7006 June 1961 0.373 0.373 25.0 27.0 Rice 17391.0 World 25.0 low
5 25 Solomon Islands 7007 July 1961 0.462 0.462 25.0 27.0 Rice 17391.0 World 25.0 low
7 25 Solomon Islands 7008 August 1961 0.211 0.211 25.0 27.0 Rice 17391.0 World 25.0 low
9 25 Solomon Islands 7009 September 1961 0.296 0.296 25.0 27.0 Rice 17391.0 World 25.0 low
In [114]:
df_cropr_max= Rice.groupby(['Year'], as_index = False)['yield'].max()
df_cropw_max= Wheat.groupby(['Year'], as_index = False)['yield'].max()
df_cropm_max= Maize.groupby(['Year'], as_index = False)['yield'].max()
In [115]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=df_cropw_max["Year"], y=df_cropw_max["yield"], mode="lines+markers", name="Max Wheat"),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=df_cropr_max["Year"], y=df_cropr_max["yield"], mode="lines+markers", name="Max Rice"),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=df_cropm_max["Year"], y=df_cropm_max["yield"], mode="lines+markers", name="Max Maize"),
    secondary_y=False,
)    
fig.add_trace(
    go.Scatter(x=df_temp_metro_max["Year"], y=df_temp_metro_max["temperature"], mode="lines+markers", name="Max Temp"),
    secondary_y=True,
)

# Add figure title
fig.update_layout(
    title_text="Max crop production & Maximum Temperature"
)

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="<b>crop production per Hectare</b> ", secondary_y=False)
fig.update_yaxes(title_text="<b>Temperature in celicious</b> ", secondary_y=True)

fig.show()
2016 is the warmest of all years .There was a pause between 1999 and 2005.

Merging temperature,co2 and crop data

In [116]:
df_co2.head()
Out[116]:
CountryName CountryCode year co2
0 Aruba ABW 1970 42.306298
1 Afghanistan AFG 1970 14306.616300
2 Angola AGO 1970 60648.601080
3 Albania ALB 1970 6960.868200
4 Arab World ARB 1970 552362.062000
In [117]:
df_co2.rename(columns={'CountryName' :'Area', 'year':'Year', 'Value': 'co2'}, inplace = True)
In [118]:
df_co2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10234 entries, 0 to 10233
Data columns (total 4 columns):
Area           10234 non-null object
CountryCode    10234 non-null object
Year           10234 non-null object
co2            10234 non-null float64
dtypes: float64(1), object(3)
memory usage: 319.9+ KB
In [119]:
df_co2['Year']= df_co2.Year.astype('int64')
df_co2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10234 entries, 0 to 10233
Data columns (total 4 columns):
Area           10234 non-null object
CountryCode    10234 non-null object
Year           10234 non-null int64
co2            10234 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 319.9+ KB
In [120]:
df_crop_temp.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 169677 entries, 0 to 218176
Data columns (total 14 columns):
Area Code_x     169677 non-null int64
Area            169677 non-null object
Months Code     169677 non-null int64
Months          169677 non-null object
Year            169677 non-null int64
temperature     169677 non-null float64
pos_temp        169677 non-null float64
Area Code_y     169677 non-null float64
Item Code       169677 non-null float64
Item            169677 non-null object
yield           169677 non-null float64
region          169677 non-null object
Country Code    169677 non-null float64
temp            169677 non-null object
dtypes: float64(6), int64(3), object(5)
memory usage: 19.4+ MB
In [121]:
df_crop_temp_sorted=df_crop_temp.sort_values(by=['Year'])
df_co2_sorted=df_co2.sort_values(by=['Year'])
In [122]:
df_crop_temp_co2  = pd.merge_asof(df_co2_sorted, df_crop_temp_sorted,
              on='Year',
              by='Area')

df_crop_temp_co2.head()
Out[122]:
Area CountryCode Year co2 Area Code_x Months Code Months temperature pos_temp Area Code_y Item Code Item yield region Country Code temp
0 Aruba ABW 1970 4.230630e+01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Mauritius MUS 1970 6.934602e+02 137.0 7008.0 August 0.293 0.293 137.0 27.0 Rice 34449.0 World 137.0 low
2 Malawi MWI 1970 6.200614e+03 130.0 7009.0 September 0.558 0.558 130.0 83.0 Sorghum 7170.0 Least_Developed_Countries 130.0 low
3 Malaysia MYS 1970 1.133742e+05 131.0 7007.0 July -0.115 0.000 131.0 56.0 Maize 21245.0 Upper_middle_income_economies 131.0 high
4 North America NAC 1970 5.895827e+06 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
In [123]:
df_crop_temp_co2 = df_crop_temp_co2.dropna()
In [124]:
df_co2.isna().sum()
Out[124]:
Area           0
CountryCode    0
Year           0
co2            0
dtype: int64
In [125]:
df_crop_temp_co2.head()
Out[125]:
Area CountryCode Year co2 Area Code_x Months Code Months temperature pos_temp Area Code_y Item Code Item yield region Country Code temp
1 Mauritius MUS 1970 693.46024 137.0 7008.0 August 0.293 0.293 137.0 27.0 Rice 34449.0 World 137.0 low
2 Malawi MWI 1970 6200.61420 130.0 7009.0 September 0.558 0.558 130.0 83.0 Sorghum 7170.0 Least_Developed_Countries 130.0 low
3 Malaysia MYS 1970 113374.23980 131.0 7007.0 July -0.115 0.000 131.0 56.0 Maize 21245.0 Upper_middle_income_economies 131.0 high
5 Namibia NAM 1970 7251.14550 147.0 7005.0 May 0.937 0.937 147.0 79.0 Millet 2273.0 World 147.0 low
6 Niger NER 1970 8021.28610 158.0 7008.0 August 0.187 0.187 158.0 79.0 Millet 3770.0 Low_Income_Food_Deficit_Countries 158.0 low
In [126]:
df_crop_temp_co2.region.unique()
Out[126]:
array(['World', 'Least_Developed_Countries',
       'Upper_middle_income_economies',
       'Low_Income_Food_Deficit_Countries', 'Sub_Saharan_Africa',
       'Annex_I_countries', 'OECD', 'Asia', 'Western_Africa',
       'Latin_America_and_the_Caribbean', 'Non_Annex_I_countries',
       'High_income_economies', 'Western_Asia', 'Africa',
       'South_Asia_(exc_India)', 'Central_America', 'South_Eastern_Asia',
       'North_Africa_(exc_Sudan)', 'Small_Island_Developing_States',
       'Lower_middle_income_economies',
       'Sub_Saharan_Africa_(including_Sudan)', 'Americas',
       'Net_Food_Importing_Developing_Countries', 'European_Union',
       'Europe', 'South_America', 'Land_Locked_Developing_Countries',
       'Western_Asia_and_Northern_Africa', 'Northern_America_and_Europe',
       'Oceania_excluding_Australia_and_New_Zealand', 'Southern_Asia',
       'Southern_Europe', 'Northern_Africa',
       'Eastern_Asia_and_South_eastern_Asia', 'Low_income_economies',
       'Northern_Europe', 'Eastern_Europe', 'Eastern_Africa', 'Caribbean',
       'Western_Europe', 'Central_Asia_and_Southern_Asia',
       'Southern_Africa', 'Melanesia', 'Eastern_Asia', 'Middle_Africa',
       'Oceania', 'Australia_and_New_Zealand', 'Central_Asia',
       'East_Asia_(exc_China)'], dtype=object)
In [127]:
df_crop_temp_co2_reg = df_crop_temp_co2.query( "region in ['World','Asia','Africa' ]")
df_crop_temp_co2_reg.head()
Out[127]:
Area CountryCode Year co2 Area Code_x Months Code Months temperature pos_temp Area Code_y Item Code Item yield region Country Code temp
1 Mauritius MUS 1970 693.46024 137.0 7008.0 August 0.293 0.293 137.0 27.0 Rice 34449.0 World 137.0 low
5 Namibia NAM 1970 7251.14550 147.0 7005.0 May 0.937 0.937 147.0 79.0 Millet 2273.0 World 147.0 low
8 Nicaragua NIC 1970 8313.10540 157.0 7001.0 January -0.071 0.000 157.0 27.0 Rice 31972.0 World 157.0 high
11 Nepal NPL 1970 19946.07520 149.0 7019.0 Sep–Oct–Nov 0.169 0.169 149.0 56.0 Maize 18695.0 Asia 149.0 low
16 Pakistan PAK 1970 87131.53790 165.0 7002.0 February 0.538 0.538 165.0 56.0 Maize 11208.0 World 165.0 low

Plot co2 , temperature and crop by region

In [128]:
fig = px.scatter(
    df_crop_temp_co2_reg, 
    x="temperature", 
    y="co2", 
    animation_frame="Year", 
    animation_group="Area",
    size="yield", 
    color="region", 
    hover_name="Area", 
    facet_col="region",
    size_max=45
#     range_y=[0,1000]
)
fig.show()
In [129]:
import plotly.express as px
fig = px.bar(df_crop_temp_co2_reg, x="region", y="co2",color='co2',
            animation_frame='Year', hover_name='Area')
fig.show()
In [130]:
df_crop_temp_co2.nlargest(5, ['co2'])
# df.nlargest(5, ['total exports'])
Out[130]:
Area CountryCode Year co2 Area Code_x Months Code Months temperature pos_temp Area Code_y Item Code Item yield region Country Code temp
7905 North Macedonia MKD 2003 22936477.92 154.0 7001.0 January 2.195 2.195 154.0 27.0 Rice 42551.0 Northern_America_and_Europe 154.0 high
6713 North Macedonia MKD 1998 22921216.59 154.0 7017.0 Mar–Apr–May 0.045 0.045 154.0 27.0 Rice 50723.0 World 154.0 low
7666 North Macedonia MKD 2002 22163474.75 154.0 7020.0 Meteorological year 0.420 0.420 154.0 44.0 Barley 26541.0 Europe 154.0 low
6952 North Macedonia MKD 1999 20319706.83 154.0 7012.0 December 1.204 1.204 154.0 27.0 Rice 41540.0 World 154.0 low
7428 North Macedonia MKD 2001 19756651.95 154.0 7010.0 October 2.934 2.934 154.0 15.0 Wheat 21298.0 Europe 154.0 high